Encoder and Decoder Side Global and Local Motion Estimation for Distributed Video Coding
In this paper, we propose a new Distributed Video Coding (DVC) architecture where motion estimation is performed both at the encoder and the decoder, effectively combining global and local motion models. We show that the proposed approach significantly improves the quality of the Side Information (SI), especially for sequences with complex motion patterns. In turn, it leads to rate-distortion gains of up to 1 dB when compared to the state-of-the-art DISCOVER DVC codec.
New Light Field Image Dataset
Recently, light field imaging, an emerging technology that enables capturing the full light information in a scene, has gained considerable interest. To design, develop, implement, and test novel algorithms for light field image processing and compression, the availability of suitable light field image datasets is essential. In this paper, a publicly available light field image dataset is introduced and described in detail. The proposed dataset contains 118 light field images captured using a Lytro Illum light field camera. Based on their content, the acquired light field images were classified into ten categories with various features, covering a wide range of potential uses such as image compression and quality evaluation.
Assessment Framework for Deepfake Detection in Real-world Situations
Detecting digital face manipulation in images and video has attracted
extensive attention due to the potential risk to public trust. To counteract
the malicious usage of such techniques, deep learning-based deepfake detection
methods have been employed and have exhibited remarkable performance. However,
the performance of such detectors is often assessed on related benchmarks that
hardly reflect real-world situations. For example, the impact of various image
and video processing operations and typical workflow distortions on detection
accuracy has not been systematically measured. In this paper, a more reliable
assessment framework is proposed to evaluate the performance of learning-based
deepfake detectors in more realistic settings. To the best of our
knowledge, it is the first systematic assessment approach for deepfake
detectors that not only reports the general performance under real-world
conditions but also quantitatively measures their robustness toward different
processing operations. To demonstrate the effectiveness and usage of the
framework, extensive experiments and detailed analysis of three popular
deepfake detection methods are further presented in this paper. In addition, a
stochastic degradation-based data augmentation method driven by realistic
processing operations is designed, which significantly improves the robustness
of deepfake detectors.
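The stochastic degradation idea above can be sketched as follows. The specific operations (additive noise, box blur, resolution reduction) and their parameters are illustrative stand-ins for the realistic processing operations studied in the paper, not its actual augmentation pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(img, sigma=10.0):
    """Additive Gaussian noise, standing in for sensor/transmission noise."""
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0, 255)

def box_blur(img, k=3):
    """Simple box blur, standing in for compression smoothing."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def down_up(img, factor=2):
    """Nearest-neighbour downscale then upscale, mimicking low-resolution re-encoding."""
    small = img[::factor, ::factor]
    up = np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)
    return up[:img.shape[0], :img.shape[1]]

def stochastic_degrade(img):
    """Apply each degradation independently with probability 0.5."""
    for op in (add_noise, box_blur, down_up):
        if rng.random() < 0.5:
            img = op(img)
    return img
```

Augmenting training batches with such randomly sampled distortions exposes a detector to workflow-like degradations during training, which is the spirit of the robustness improvement reported above.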
Explanation of Face Recognition via Saliency Maps
Despite significant progress in face recognition in recent years, deep face
recognition models are often treated as "black boxes" and criticized for lacking
explainability. It becomes increasingly important to understand the
characteristics and decisions of deep face recognition systems to make them
more acceptable to the public. Explainable face recognition (XFR) refers to the
problem of interpreting why the recognition model matches a probe face with one
identity over others. Recent studies have explored the use of visual saliency maps
as an explanation, but they often lack a deeper analysis in the context of face
recognition. This paper starts by proposing a rigorous definition of
explainable face recognition (XFR) which focuses on the decision-making process
of the deep recognition model. Following the new definition, a similarity-based
RISE algorithm (S-RISE) is then introduced to produce high-quality visual
saliency maps. Furthermore, an evaluation approach is proposed to
systematically validate the reliability and accuracy of general visual
saliency-based XFR methods.
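A minimal sketch of the RISE-style masking idea behind a similarity-based saliency map: random occlusion masks are applied to the input face, and each mask is weighted by the similarity the embedding model assigns between the masked image and the probe feature. The `embed` function and all parameters here are placeholders, not the paper's actual model or algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def similarity_saliency(img, probe_feat, embed, n_masks=200, cells=4, p=0.5):
    """RISE-style saliency: accumulate random masks weighted by the
    similarity score of the masked image against the probe feature."""
    H, W = img.shape
    sal = np.zeros((H, W))
    total = 0.0
    for _ in range(n_masks):
        # coarse random on/off grid, upsampled to a block-wise occlusion pattern
        grid = (rng.random((cells, cells)) < p).astype(float)
        mask = np.kron(grid, np.ones((H // cells, W // cells)))
        score = cosine(embed(img * mask), probe_feat)
        sal += score * mask
        total += score
    return sal / max(total, 1e-8)
```

Pixels that remain visible in high-scoring masks accumulate more weight, so the resulting map highlights the regions the model relies on when matching the probe identity.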
Cross-resolution Face Recognition via Identity-Preserving Network and Knowledge Distillation
Cross-resolution face recognition has become a challenging problem for modern
deep face recognition systems. It aims at matching a low-resolution probe image
with high-resolution gallery images registered in a database. Existing methods
mainly leverage prior information from high-resolution images by either
reconstructing facial details with super-resolution techniques or learning a
unified feature space. To address this challenge, this paper proposes a new
approach that enforces the network to focus on the discriminative information
stored in the low-frequency components of a low-resolution image. A
cross-resolution knowledge distillation paradigm is first employed as the
learning framework. Then, an identity-preserving network, WaveResNet, and a
wavelet similarity loss are designed to capture low-frequency details and boost
performance. Finally, an image degradation model is conceived to simulate more
realistic low-resolution training data. Consequently, extensive experimental
results show that the proposed method consistently outperforms the baseline
model and other state-of-the-art methods across a variety of image resolutions.
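The cross-resolution distillation paradigm can be illustrated with a simple feature-matching loss: a teacher embedding of the high-resolution face supervises the student's embedding of its low-resolution counterpart. This is a generic sketch under that assumption; the paper's WaveResNet backbone and wavelet similarity loss are not reproduced here.

```python
import numpy as np

def kd_feature_loss(student_feats, teacher_feats):
    """Cross-resolution distillation loss: 1 - cosine similarity between
    the student's low-resolution embeddings and the teacher's
    high-resolution embeddings, averaged over the batch."""
    s = student_feats / (np.linalg.norm(student_feats, axis=1, keepdims=True) + 1e-8)
    t = teacher_feats / (np.linalg.norm(teacher_feats, axis=1, keepdims=True) + 1e-8)
    return float(np.mean(1.0 - np.sum(s * t, axis=1)))
```

Minimizing this term pulls the student's low-resolution features toward the identity-bearing directions of the teacher's high-resolution features, which is the core of an identity-preserving distillation objective.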
Quality of Multimedia Experience: Past, Present and Future
This talk starts by defining what Quality of Experience is. It then provides an overview of the state of the art in Quality of Experience for multimedia systems. Finally, it concludes by presenting challenges and trends that need to be further addressed.
The JPEG2000 still image compression standard
The development of standards (emerging and established) by the International Organization for Standardization (ISO), the International Telecommunications Union (ITU), and the International Electrotechnical Commission (IEC) for audio, image, and video, for both transmission and storage, has led to worldwide activity in developing hardware and software systems and products applicable to a number of diverse disciplines [7], [22], [23], [55], [56], [73]. Although the standards implicitly address the basic encoding operations, there is freedom and flexibility in the actual design and development of devices. This is because only the syntax and semantics of the bit stream for decoding are specified by standards, their main objective being compatibility and interoperability among the systems (hardware/software) manufactured by different companies. There is, thus, much room for innovation and ingenuity. Since the mid-1980s, members from both the ITU and the ISO have been working together to establish a joint international standard for the compression of grayscale and color still images. This effort has been known as JPEG, the Joint Photographic Experts Group.
Discriminative Deep Feature Visualization for Explainable Face Recognition
Despite the huge success of deep convolutional neural networks in face
recognition (FR) tasks, current methods lack explainability for their
predictions because of their "black-box" nature. In recent years, studies have
been carried out to give an interpretation of the decision of a deep FR system.
However, the affinity between the input facial image and the extracted deep
features has not been explored. This paper contributes to the problem of
explainable face recognition by first conceiving a face reconstruction-based
explanation module, which reveals the correspondence between the deep feature
and the facial regions. To further interpret the decision of an FR model, a
novel visual saliency explanation algorithm has been proposed. It provides
insightful explanations by producing visual saliency maps that represent similar
and dissimilar regions between input faces. A detailed analysis has been
presented for the generated visual explanation to show the effectiveness of the
proposed method.
Towards Visual Saliency Explanations of Face Recognition
Deep convolutional neural networks have been pushing the frontier of face
recognition (FR) techniques in recent years. Despite their high accuracy, they
are often criticized for lacking explainability. There has been an increasing
demand for understanding the decision-making process of deep face recognition
systems. Recent studies have investigated using visual saliency maps as an
explanation, but they often lack a discussion and analysis in the context of
face recognition. This paper conceives a new explanation framework for face
recognition. It starts by providing a new definition of the saliency-based
explanation method, which focuses on the decisions made by the deep FR model.
Then, a novel correlation-based RISE algorithm (CorrRISE) is proposed to
produce saliency maps, which reveal both the similar and dissimilar regions of
any given pair of face images. In addition, two evaluation metrics are designed to
measure the performance of general visual saliency explanation methods in face
recognition. Consequently, substantial visual and quantitative results have
shown that the proposed method consistently outperforms other explainable face
recognition approaches.